Abstract

The dynamics and the stationary states of an exactly solvable three-state layered 
feed-forward neural network model with asymmetric synaptic connections, finite 
dilution and low pattern activity are studied in extension of a recent work on a 
recurrent network. Detailed phase diagrams are obtained for the stationary states 
and for the time evolution of the retrieval overlap with a single pattern. It is shown 
that the network develops instabilities for low thresholds and that there is a gradual 
improvement in network performance with increasing threshold up to an optimal 
stage. The robustness to synaptic noise is checked and the effects of dilution and of 
variable threshold on the information content of the network are also established. 
1 Introduction



The statics and the dynamics of large attractor and feed-forward neural networks
inspired by features of biological networks have been studied for some time.
Some of these features are the asymmetry and the finite dilution of the synaptic
connections as well as the low activity of the patterns and the neurons. It has been
suggested that a recurrent attractor neural network model of multi-state neurons
with these characteristics may describe the short-term memorization performance
of the CA3 region of the hippocampus in both the human brain and the brain
of primates, and results of numerical simulations on the retrieval behavior are now
available [1]. Full results on the dynamic evolution of the retrieval overlap and on
the phase diagrams for the stationary states of low-activity networks of multi-state
neurons with asymmetric interactions and finite dilution are still missing. It would
be desirable to have such results in order to describe other implementations of
neural network models, among them devices to account for long-term memory
in the brain [2].

The retrieval behavior and thermodynamic properties of symmetrically diluted 
Q-Ising recurrent neural networks with finite dilution and low activity patterns 
have been studied and full phase diagrams have been obtained for Q = 3 and 
Q = 4 states as well as for a network with continuous response neurons [3]. In the
case of a discrete number of states, phase boundaries between states of low and 
high performance are found to disappear beyond a finite dilution, allowing for the 
biologically appealing feature of a continuous improvement of the behavior of the 
network without the need of a precise threshold adjustment. The full study of the
dynamics of these networks is rather involved (see ref. [4]) and for practical
implementations of neural networks, either in hardware or in biology, it would be
useful to have a simpler dynamical system.

A suitably tractable model to study these issues is a feed-forward layered network 
with no feedback loops, which has an exactly solvable non-trivial dynamics, in 
which the only non-zero synaptic connections are the asymmetric interactions that 
pass information from one layer to the next. Despite the fact that the connectivity 
of the layered network is much lower than that of a recurrent network, the presence 
of non-trivial correlations makes it an ideal model to test the qualitative behavior 
of feature dependence in recurrent networks. The study of the model in itself is also 
of interest in view of the practical applications of feed-forward layered networks. 

The purpose of the present work is to present new results on the retrieval
performance, the information content and the dynamic evolution for this network
with three-state Ising neurons and finite dilution. Results for the diluted layered
network with binary units and for the three-state network with no dilution can be
found in the literature [5,6].

The outline of the paper is the following. In Section 2 we introduce the model and 
the relevant macroscopic variables. In Section 3 we derive the recursion relations 
for these variables that establish the dynamics for the model. We discuss the results 
for the phase diagrams of the stationary states and the dynamical evolution of the 
retrieval overlaps in Section 4, and end with concluding remarks in Section 5.



2 The Model 

The network model consists of $L$ layers with $N$ neurons on each layer, which can
take values $\sigma_i^l$, $l = 1,\dots,L$; $i = 1,\dots,N$, from the set $S = \{-1, 0, +1\}$,
where $\pm 1$ denote the active states. A macroscopic number $p = \alpha c N$ of ternary
patterns, where $c$ is the probability that two neurons are connected, is taken from a
set of independent identically distributed random variables $\{\xi_i^{\mu,l} = 0, \pm 1\}$,
$\mu = 1,\dots,p$, with the probability distribution

$$\mathrm{Prob}(\xi_i^{\mu,l}) = a\,\delta(|\xi_i^{\mu,l}|^2 - 1) + (1 - a)\,\delta(\xi_i^{\mu,l}) , \tag{1}$$

which is assumed to be the same for every layer and where $\pm 1$ are the active
patterns. The mean of each pattern is zero and $a = \overline{(\xi_i^{\mu,l})^2}$, where the
overline denotes the average over the patterns, is their activity. A new random set of
$p$ patterns is generated on each layer and the whole set on two consecutive layers is
embedded in the diluted network according to the generalized Hebbian learning rule

$$J_{ij}^l = \frac{c_{ij}^l}{c\,a\,N} \sum_{\mu=1}^{p} \xi_i^{\mu,l+1}\,\xi_j^{\mu,l} , \tag{2}$$

where $\{c_{ij}^l\}$ is a set of independent identically distributed random variables that
account for the dilution of the synapses, particularly when the patterns are active,
and such that $c_{ij}^l = 1$ with probability $c$ and zero with probability $1 - c$, for all $l$.
Thus, $cN$ is the mean number of neurons connected to each neuron. For the fully
connected layered network $c = 1$, and the case of extreme dilution corresponds
to the limit $c \to 0$, taken after the macroscopic limit $N \to \infty$. The dilution
introduces an additional randomness into the dynamics of the fully connected
network in the form of a static noise $z_{ij}^l/\sqrt{N}$ with mean zero and variance
$\alpha(1 - c)/N$ [5]. There is no contribution to the learning rule from patterns in the
same layer.

The states of the network change as follows. Given a configuration on the first
layer, the state $\sigma_i^{l+1}$ of unit $i$ on layer $l + 1$ is determined exclusively by the
configuration $\sigma_N^l = \{\sigma_j^l\}$, $j = 1,\dots,N$, of the units in the previous layer
according to the stochastic law

$$\mathrm{Prob}(\sigma_i^{l+1} = s \in S \mid \sigma_N^l) = \frac{\exp[-\beta\,\epsilon_i(s \mid \sigma_N^l)]}{\sum_{s' \in S} \exp[-\beta\,\epsilon_i(s' \mid \sigma_N^l)]} \tag{3}$$



in terms of the single-site energy function on that unit

$$\epsilon_i(s \mid \sigma_N^l) = -s\,h_i^{l+1}(\sigma_N^l) + \theta_i^{l+1} s^2 , \tag{4}$$

where

$$h_i^{l+1}(\sigma_N^l) = \sum_{j=1}^{N} J_{ij}^l\,\sigma_j^l \tag{5}$$

is the acting local field and $\theta_i^{l+1}$ is a local externally adjustable threshold
parameter on layer $l + 1$. Since the only changes in the network, in both the synaptic
connections and the states of the units, are between units in consecutive layers,
one may associate the layer index with a discrete time step $t$ and we do this in
the following. Thus, the evolution of the network proceeds according to a parallel
dynamics in which the states of all neurons are updated at each time step.
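As an illustration of the model just described, the following is a minimal Python sketch of one layer-to-layer update. It assumes the learning rule is normalized by $c\,a\,N$ and that the threshold enters the single-site energy as $\theta s^2$; the function names `ternary_patterns`, `diluted_couplings` and `update_layer` are hypothetical and are not taken from the paper.

```python
import numpy as np

STATES = np.array([-1, 0, 1])

def ternary_patterns(p, n, a, rng):
    """p patterns of n ternary bits: +-1 with probability a/2 each, 0 with 1 - a."""
    return rng.choice(STATES, size=(p, n), p=[a / 2, 1 - a, a / 2])

def diluted_couplings(xi_next, xi_prev, a, c, rng):
    """Hebbian couplings from one layer to the next (assumed normalization c*a*N).

    Each synapse is kept independently with probability c (c_ij = 1) or cut (c_ij = 0).
    """
    n = xi_prev.shape[1]
    mask = rng.random((xi_next.shape[1], n)) < c
    return mask * (xi_next.T @ xi_prev) / (c * a * n)

def update_layer(sigma, J, theta, beta, rng):
    """Sample the next layer from the Boltzmann law with energy -s*h + theta*s**2."""
    h = J @ sigma                                        # local fields
    energy = -np.outer(h, STATES) + theta * STATES**2    # shape (N, 3)
    logits = -beta * energy
    logits -= logits.max(axis=1, keepdims=True)          # avoid overflow in exp
    prob = np.exp(logits)
    prob /= prob.sum(axis=1, keepdims=True)
    u = rng.random(h.size)                               # inverse-CDF sampling per unit
    idx = (u[:, None] > np.cumsum(prob, axis=1)).sum(axis=1)
    return STATES[np.minimum(idx, 2)]
```

Starting the first layer in one of the stored patterns and calling `update_layer` once per layer, with fresh patterns and couplings on every layer, gives a direct simulation that can be compared with the large-$N$ theory at small load.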

Next, we consider the relevant quantities that describe the performance of the
network. First, the retrieval overlap between the state of the network and the
pattern $\{\xi_i^{\mu,t}\}$ at time (layer) $t$, given by

$$m_i^{\mu,t} = \frac{1}{c\,a\,N} \sum_j c_{ij}^t\,\xi_j^{\mu,t}\,\sigma_j^t , \tag{6}$$

which depends on the site $i$. The dynamical activity and the activity overlap are
defined as

$$q_0^t = \frac{1}{N} \sum_i (\sigma_i^t)^2 , \qquad n^{\mu,t} = \frac{1}{aN} \sum_i (\sigma_i^t)^2 (\xi_i^{\mu,t})^2 , \tag{7}$$

respectively. The latter will only be needed for the condensed pattern, that is, the
stored pattern to be retrieved.

For the mutual information, we need the conditional probability distribution
$\mathrm{Prob}(\sigma_i^t \mid \xi_i^{\mu,t})$ that a neuron $i$ is in the state $\sigma_i^t$ on layer $t$
given that the $i$-th bit of the condensed pattern is $\xi_i^{\mu,t}$. As a consequence of the
independence of the states of the units on a given layer, it is sufficient to consider
the distribution for a single typical neuron, and we omit here the index $i$. We also
omit here, for clarity, the time index and use [7]

$$\mathrm{Prob}(\sigma \mid \xi^\mu) = (s_\xi^\mu + m^\mu \xi^\mu \sigma)\,\delta(\sigma^2 - 1) + (1 - s_\xi^\mu)\,\delta(\sigma) , \tag{8}$$

where

$$s_\xi^\mu = s^\mu + \frac{n^\mu - q_0}{1 - a}\,(\xi^\mu)^2 , \qquad s^\mu = \frac{q_0 - a\,n^\mu}{1 - a} . \tag{9}$$

The mutual information between patterns and neurons, regarding the patterns as
the inputs and the neuron states as the output of the network channel on each
layer, is an architecture independent property given by [8,9]

$$I^\mu(\sigma; \xi^\mu) = S(\sigma) - S(\sigma \mid \xi^\mu) , \tag{10}$$

where

$$S(\sigma) = -q_0 \ln(q_0/2) - (1 - q_0) \ln(1 - q_0) \tag{11}$$

is the entropy and $S(\sigma \mid \xi^\mu) = a S_a + (1 - a) S_{1-a}$ is the equivocation term with

$$S_a = -c_+ \ln c_+ - c_- \ln c_- - (1 - n^\mu) \ln(1 - n^\mu) ,$$
$$S_{1-a} = -s^\mu \ln(s^\mu/2) - (1 - s^\mu) \ln(1 - s^\mu) . \tag{12}$$

Here, $c_\pm = (n^\mu \pm m^\mu)/2$ and $s^\mu$ is the parameter in the conditional probability
$\mathrm{Prob}(\sigma \mid \xi^\mu)$. The mutual information can then be used to obtain the information
content of the network, $i = I^\mu \alpha$, where $\alpha = p/cN$ is the storage ratio.
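The entropy and equivocation terms can be evaluated directly from the order parameters. The sketch below assumes the forms $S(\sigma) = -q\ln(q/2) - (1-q)\ln(1-q)$, $c_\pm = (n \pm m)/2$ and $s = (q - a n)/(1 - a)$; the last expression is an assumption obtained from the requirement that the active-state probabilities sum to $q$. The function names are hypothetical, and the result is in nats.

```python
import math

def xlogx(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0.0 else 0.0

def mutual_information(m, n, q, a):
    """I = S(sigma) - [a * S_a + (1 - a) * S_{1-a}] for a three-state neuron."""
    s = (q - a * n) / (1.0 - a)           # activity of a neuron whose pattern bit is 0
    c_plus, c_minus = (n + m) / 2.0, (n - m) / 2.0
    entropy = -xlogx(q) + q * math.log(2.0) - xlogx(1.0 - q)
    s_a = -xlogx(c_plus) - xlogx(c_minus) - xlogx(1.0 - n)
    s_1a = -xlogx(s) + s * math.log(2.0) - xlogx(1.0 - s)
    return entropy - a * s_a - (1.0 - a) * s_1a

def information_content(alpha, m, n, q, a):
    """Information content i = alpha * I."""
    return alpha * mutual_information(m, n, q, a)
```

Two limits serve as sanity checks: perfect retrieval ($m = n = 1$, $q = a$) returns the full pattern entropy, which is $\ln 3$ for uniform patterns with $a = 2/3$, and a state uncorrelated with the pattern ($m = 0$, $n = q$) returns zero.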



3 Recursion Relations 

We consider in this work the retrieval of a single (condensed) pattern, say $\mu = 1$,
in the dynamics of the network, with a finite overlap $m_i^{1,t} = O(1)$ and the remaining
$\mu > 1$ overlaps $m_i^{\mu,t} = O(1/\sqrt{N})$. The interest is in the mean overlap,
$m^t = [\overline{\langle m_i^{1,t} \rangle}]$, where $[\dots]$ denotes the average over the $c_{ij}$,
$\langle \dots \rangle$ denotes the thermal average with Eq.(3) and the bar means the average over
the patterns. We also need the activity overlap with that pattern and take
$n^t = [\overline{\langle n^{1,t} \rangle}]$.

The recurrence relations that describe the dynamics of the network for large $N$
follow from the local field, Eq.(5), written as

$$h_i^{t+1} = \xi_i^{1,t+1}\,m^t + z\,\Delta^t , \tag{13}$$

where the average condensed overlap $m^t = \overline{\langle \sigma^t \rangle\,\xi^{1,t}}/a$ depends on the
local field at time $t - 1$ and $z$ is a Gaussian random variable with zero mean and unit
variance that comes from the action of the macroscopic number of random overlaps of
the diluted network with the uncondensed patterns and the use of the central limit
theorem. The layer-dependent variance of the local field becomes site independent
and is given by

$$(\Delta^t)^2 = a \sum_{\mu > 1} \left[\overline{\langle (m_i^{\mu,t})^2 \rangle}\right] . \tag{14}$$

A direct calculation in the large-$N$ limit yields

$$(\Delta^t)^2 = \alpha(1 - c)\,q_0^t + (\Delta_0^t)^2 , \tag{15}$$

where $\alpha = p/cN$, $q_0^t = \overline{\langle (\sigma^t)^2 \rangle_{\sigma|\xi}}$ is the dynamical activity and

$$(\Delta_0^t)^2 = a \sum_{\mu > 1} \overline{\langle (m^{\mu,t})^2 \rangle} \tag{16}$$

is now the variance of the Gaussian noise for the connected layered network in
terms of the overlap with the uncondensed patterns

$$m^{\mu,t} = \frac{1}{aN} \sum_j \xi_j^{\mu,t}\,\sigma_j^t . \tag{17}$$

Thus, the noise in the local field is a superposition of two Gaussian noises, one
due to the dilution of the synaptic connections and the other to the macroscopic
number of uncondensed patterns, in extension of an earlier result for the binary
network [5].
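The splitting of the noise into two parts can be made explicit in a few lines. The following is only a sketch, assuming that the site-dependent overlap carries the dilution variables as $m_i^{\mu,t} = (caN)^{-1}\sum_j c_{ij}^t\,\xi_j^{\mu,t}\sigma_j^t$, that the $c_{ij}^t$ are independent with mean $c$, and that the uncondensed patterns are uncorrelated with the activity, so that $\sum_j (\xi_j^{\mu,t})^2(\sigma_j^t)^2 \simeq N a\,q_0^t$:

```latex
\begin{align}
[c_{ij}^t c_{ik}^t] &= c^2 + c(1-c)\,\delta_{jk}, \\
[(m_i^{\mu,t})^2] &= (m^{\mu,t})^2
   + \frac{1-c}{c\,a^2N^2}\sum_j (\xi_j^{\mu,t})^2(\sigma_j^t)^2
   \simeq (m^{\mu,t})^2 + \frac{(1-c)\,q_0^t}{c\,aN}, \\
a\sum_{\mu>1}[(m_i^{\mu,t})^2] &\simeq (\Delta_0^t)^2
   + a\,p\,\frac{(1-c)\,q_0^t}{c\,aN}
   = (\Delta_0^t)^2 + \alpha(1-c)\,q_0^t .
\end{align}
```

The last step uses $p = \alpha c N$. Under these assumptions the dilution term vanishes for $c = 1$ and reduces to $\alpha q_0^t$ in the extreme-dilution limit $c \to 0$.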

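Since our main interest in this work is in the effects of synaptic dilution in a
low-activity network, we take a uniform and time-independent threshold $\theta_i^t = \theta$.
The averages $\langle \sigma^t \rangle$ and $\langle (\sigma^t)^2 \rangle$ are then given, respectively, by

$$F_\beta(h^t, \theta) = \frac{\sinh(\beta h^t)}{\frac{1}{2} e^{\beta\theta} + \cosh(\beta h^t)} , \qquad G_\beta(h^t, \theta) = \frac{\cosh(\beta h^t)}{\frac{1}{2} e^{\beta\theta} + \cosh(\beta h^t)} , \tag{18}$$

which, in the zero temperature limit, $\beta \to \infty$, become

$$F_\infty = \mathrm{sign}(h^t)\,\Theta(|h^t| - \theta) , \qquad G_\infty = \Theta(|h^t| - \theta) , \tag{19}$$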
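For numerical work the two averages can be computed directly from the Boltzmann weights of the three states, which avoids overflow of the hyperbolic functions at low temperature. The sketch below assumes the single-site energy $-s h + \theta s^2$; the helper names are hypothetical.

```python
import math

def state_probabilities(h, theta, beta):
    """Boltzmann probabilities of s = +1, 0, -1 for the energy -s*h + theta*s**2."""
    logits = [beta * (h - theta), 0.0, -beta * (h + theta)]
    top = max(logits)                          # subtract the maximum for stability
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]

def F(h, theta, beta):
    """Thermal average <s>."""
    p_plus, _, p_minus = state_probabilities(h, theta, beta)
    return p_plus - p_minus

def G(h, theta, beta):
    """Thermal average <s^2>."""
    p_plus, _, p_minus = state_probabilities(h, theta, beta)
    return p_plus + p_minus

def F_zero_temperature(h, theta):
    """sign(h) * step(|h| - theta)."""
    return math.copysign(1.0, h) if abs(h) > theta else 0.0
```

At moderate $\beta$ these functions agree with the closed forms written in terms of $\sinh$ and $\cosh$, and for large $\beta$ they approach the step-function limit away from $|h| = \theta$.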

where $\Theta(x)$ is the usual step function. The performance with a self-adjusting
time-dependent threshold has been considered in a fully connected layered network [10]
and in other architectures [10-14]. Exact dynamic equations are then obtained in
the large-$N$ limit in the form of recursion relations for $m^t$, $q_0^t$ and $n^t$, where the
latter two are needed for the information content of the network. Similarly, an
exact recursion relation for the second term in the width of the stochastic noise,
Eq.(15), is obtained in the form

$$(\Delta_0^t)^2 = \alpha c\,q_0^t + (\chi^t)^2 (\Delta_0^{t-1})^2 , \tag{20}$$

where

$$\chi^t = \beta(q_0^t - q_1^t) \tag{21}$$

is the susceptibility, in which $q_1^t = \overline{\langle \sigma^t \rangle^2}$. Thus we obtain

$$m^{t+1} = \int Dz\,F_\beta(m^t + z\Delta^t, \theta) , \tag{22}$$
$$q_0^{t+1} = \int Dz\,\{a\,G_\beta(m^t + z\Delta^t, \theta) + (1 - a)\,G_\beta(z\Delta^t, \theta)\} , \tag{23}$$
$$q_1^{t+1} = \int Dz\,\{a\,F_\beta^2(m^t + z\Delta^t, \theta) + (1 - a)\,F_\beta^2(z\Delta^t, \theta)\} , \tag{24}$$

where, as usual, $Dz = \exp(-z^2/2)\,dz/\sqrt{2\pi}$. We also have

$$n^{t+1} = \int Dz\,G_\beta(m^t + z\Delta^t, \theta) . \tag{25}$$

The dynamics, including transients, and the stationary states of the diluted layered
network then follow from the solutions of these equations together with Eq.(20).
The stationary states are reached when $m^{t+1} = m^t$, $q_0^{t+1} = q_0^t$, $q_1^{t+1} = q_1^t$,
$n^{t+1} = n^t$ and $\Delta_0^{t+1} = \Delta_0^t$, and we call the first three, respectively, $m$, $q$
and $q_1$. The phase diagrams for the stationary states and the time evolution of the
order parameters are discussed in the next section.
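The recursion relations can be iterated numerically at finite temperature with Gauss-Hermite quadrature for the Gaussian measure $Dz$. The sketch below is not the authors' code and rests on several assumptions: the noise recursion is read as $(\Delta_0^{t+1})^2 = \alpha c\,q_0^{t+1} + (\chi^{t+1})^2(\Delta_0^t)^2$ with $\chi = \beta(q_0 - q_1)$ evaluated on the new layer, the first layer has activity $q_0 = a$ and uncorrelated noise $(\Delta_0)^2 = \alpha c\,q_0$, and the names `transfer`, `step` and `iterate` are hypothetical.

```python
import numpy as np

# Nodes and weights for the measure Dz = exp(-z^2/2) dz / sqrt(2 pi).
Z, W = np.polynomial.hermite_e.hermegauss(80)
W = W / np.sqrt(2.0 * np.pi)

def transfer(h, theta, beta):
    """Return (<s>, <s^2>) for s in {-1, 0, +1} with energy -s*h + theta*s**2."""
    h = np.asarray(h, dtype=float)
    logits = np.stack([beta * (h - theta), np.zeros_like(h), -beta * (h + theta)])
    logits = logits - logits.max(axis=0)        # stabilise the exponentials
    weights = np.exp(logits)
    p_plus, _, p_minus = weights / weights.sum(axis=0)
    return p_plus - p_minus, p_plus + p_minus

def step(m, q0, d0_sq, a, alpha, c, theta, beta):
    """One layer: returns the new (m, q0, Delta_0^2, n)."""
    delta = np.sqrt(alpha * (1.0 - c) * q0 + d0_sq)      # total noise width
    f1, g1 = transfer(m + Z * delta, theta, beta)        # sites with an active bit
    f0, g0 = transfer(Z * delta, theta, beta)            # sites with an inactive bit
    m_new = W @ f1
    q0_new = W @ (a * g1 + (1.0 - a) * g0)
    q1_new = W @ (a * f1**2 + (1.0 - a) * f0**2)
    n_new = W @ g1
    chi = beta * (q0_new - q1_new)                       # assumed time index, see text
    d0_sq_new = alpha * c * q0_new + chi**2 * d0_sq      # assumed form of the recursion
    return m_new, q0_new, d0_sq_new, n_new

def iterate(m0, a, alpha, c, theta, T, steps=50):
    """Trajectory [(m, q0, n), ...] for layers 1..steps at temperature T > 0."""
    beta = 1.0 / T
    m, q0 = m0, a                      # assumed initial activity on the first layer
    d0_sq = alpha * c * q0             # assumed uncorrelated initial noise
    trajectory = []
    for _ in range(steps):
        m, q0, d0_sq, n = step(m, q0, d0_sq, a, alpha, c, theta, beta)
        trajectory.append((m, q0, n))
    return trajectory
```

A call such as `iterate(0.8, a=0.5, alpha=0.06, c=0.8, theta=0.33, T=0.05)` produces an overlap trajectory of the kind discussed in the next section. Because the quadrature resolves sharp steps poorly, the zero-temperature limit is better handled with the step-function forms and error functions than with a very small `T`.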



4 Dynamics and stationary states 



The stable stationary states yield one or more retrieval phases R($m > 0$, $q_1 > 0$)
and a spin-glass phase SG($m = 0$, $q_1 > 0$), all as sustained activity solutions with
$q > 0$. Since the model is a dynamical one, the stability criterion is that the
change in the order parameters should become smaller in every one of the final
steps of the iteration procedure of the flow towards an attractor fixed point. Thus,
for the retrieval overlap one has

$$\lim_{\delta m(t) \to 0} \left| \frac{\delta m(t+1)}{\delta m(t)} \right| < 1 , \tag{26}$$

and similar relationships for the other parameters.
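In a numerical iteration this criterion can be monitored through the ratio of successive changes of the overlap over the last few steps. The sketch below uses successive differences as a proxy for the deviation $\delta m(t)$ from the fixed point, which is an assumption that holds for linear convergence; the function names and the window length are hypothetical.

```python
def contraction_ratios(trajectory):
    """|dm(t+1) / dm(t)| for successive differences dm(t) = m(t+1) - m(t)."""
    diffs = [b - a for a, b in zip(trajectory, trajectory[1:])]
    return [abs(d2 / d1) for d1, d2 in zip(diffs, diffs[1:]) if d1 != 0.0]

def looks_stable(trajectory, window=5):
    """True if the change shrinks in each of the last `window` recorded steps."""
    ratios = contraction_ratios(trajectory)[-window:]
    return len(ratios) > 0 and all(r < 1.0 for r in ratios)
```

For example, a sequence approaching its limit geometrically with factor 0.5 gives ratios equal to 0.5 and is classified as stable, while a sequence whose changes grow from step to step is not.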

We are interested in this paper in the characteristic features of the phase diagrams
at finite dilution and in the specific performance of the network. Different kinds
of phase diagrams are obtained depending on the pattern activity $a$ and on $c$. In
the case of full connectivity ($c = 1$) and low $a$, we find the ($\alpha$, $\theta$) phase diagram
shown in Fig. 1 for $a = 0.5$ and either $T = 0$ (full lines) or $T = 0.05$ (dashed
lines). The lines represent discontinuous phase boundaries where the locally stable
retrieval states appear with decreasing load $\alpha$ below a critical $\alpha_c$, and the dotted
line indicates the locus of optimal performance, which yields the largest load capacity
that still sustains retrieval behavior. There is a weak retrieval phase (I) and a
strong retrieval phase (II) separated by a discontinuous phase boundary. There
is also a smaller retrieval phase in the lower triangular region. For larger activity
and full connectivity, as in the case of uniform patterns ($a = 2/3$), there is mostly
a single retrieval phase with a strong $\alpha$-dependent optimal performance and this
feature remains even for finite dilution, say for $c = 0.5$.

Fig. 1. Storage capacity vs. threshold ($\alpha$, $\theta$) phase diagram for the fully connected
$Q = 3$ Ising layered feed-forward network, with activity $a = 0.5$, $T = 0$ (full lines)
and $T = 0.05$ (dashed lines). The phases are described in the text and the dotted line
is the locus of optimal performance.

A different situation appears for finite dilution, with $c$ smaller than a critical
activity-dependent $c^*$, and for low activity, where part of the phase boundary
between regions I and II disappears. For $a = 0.5$ this is the case when $c^* = 0.87$
at $\alpha = 0.34$ and $T = 0$. As shown in Fig. 2, for typical $c = 0.8$ and $a = 0.5$,
there is now a gradual transition from region I to region II and, as $c$ decreases, the
continuous transition region increases until a stage is reached where the maxima
in both regions merge into a single maximum with a lower optimal threshold $\theta$.
The phase diagram in Fig. 2 is reminiscent of the diagram for the finite diluted
recurrent $Q = 3$ state network with uniform patterns [3].

Fig. 2. ($\alpha$, $\theta$) phase diagram for the diluted network, with $c = 0.8$, $a = 0.5$, for
$T = 0$ (full lines) and $T = 0.05$ (dashed lines). The phases are described in the text.

In order to study the dynamics and the stability of the phases, we consider the
time evolution of the retrieval overlap shown in Fig. 3 for $c = 0.8$, $a = 0.5$,
$\alpha = 0.06$, $T = 0$ and $\theta$ between 0.293 and 0.33. For small $\theta$ the retrieval state is
not stable, as shown by the typical lower curve ($\theta = 0.293$), and there is a whole part
of region I where this is the case. Since it is not clear that these instabilities have
any significance, we do not explore this issue further in this work. On the other
hand, as $\theta$ increases, the retrieval state becomes stable and a maximum overlap is
reached after a relatively short transient.

A further property of the network is the increase in the maximum information
content with dilution, as shown near the optimal performance with $\theta = 0.5$
for $a = 0.5$ and various values of $c$ in Fig. 4. Note that the result of a non-zero
information content for increasing $\alpha$ with dilution is not surprising since we defined
$\alpha = p/cN$, but the increase of the maximum information content with dilution is
a novel feature.

There is also a considerable increase in the maximum information content with
an increasing threshold in the good performance region II, as shown in Fig. 5 for
$c = 0.8$, $a = 0.5$ and various values of $\theta$. Also, the decrease from the maximum
is smoother with dilution than in the case of the fully connected layered network.
Finally, one may consider the robustness of the network to synaptic noise, and in
Fig. 6 we show the maximum information content $i_{\max}$ for $a = 0.5$, $\theta = 0.5$ and
various values of $c$. Clearly, there is a more gradual decrease in performance with
$T$ for lower connectivity.
Fig. 3. Time evolution of the retrieval overlap $m$ for the diluted network with $c = 0.8$,
$a = 0.5$, $T = 0$ and $\alpha = 0.06$, for $\theta = 0.293$, 0.30, 0.31, 0.32 and 0.33, from bottom
to top. Note the instability for large $t$ in the first case, and the transients for the other
cases.




Fig. 4. Information content $i = I\alpha$ for $a = 0.5$, $\theta = 0.5$ and $c = 1$ (fully connected
layered network), $c = 0.8$, 0.6 and 0.4, from bottom to top.
Fig. 5. Information content $i$ for $a = 0.5$, $c = 0.8$ and threshold $\theta = 0.30$, 0.32, 0.34
and 0.36, from bottom to top. The dotted parts of the lines indicate unstable states.




Fig. 6. Maximum information content $i_{\max}$ as a function of the synaptic noise $T$ for
$a = 0.5$, $\theta = 0.5$ and $c = 1$ (fully connected layered network), 0.5, 0.25 and 0
(extremely diluted network), from bottom to top.
5 Summary and conclusions



We derived the exact recursion relations in the large-$N$ limit that describe the
time evolution and the phase diagrams for the stationary states of the macroscopic 
variables in a three-state layered feed-forward network with finite synaptic dilution. 
Synaptic dilution appears as a static stochastic noise for a macroscopic number 
of stored patterns [5,15]. This is a model with asymmetric interactions between 
units in consecutive layers and we allow for variable pattern activity in training the 
network with ternary patterns. Instead of discontinuous phase boundaries between 
retrieval and non-retrieval states, or between qualitatively different retrieval states 
for full synaptic connectivity, we find that there can be a continuous change with 
no boundary at all from weak to optimal retrieval states with a varying threshold in 
the local field, for low pattern activity. Phase boundaries of continuous transitions 
are known to appear in extremely diluted symmetric or asymmetric networks, but 
we emphasize that the continuous changeover in the present model is due to the 
joint action of finite dilution and low activity patterns. 

In view of a similar recent result for a three-state recurrent diluted network with
symmetric synaptic connections [3], one may conclude that this is a feature of
finite dilution which is independent of both the network architecture and the
interaction symmetry. Thus, provided the threshold is above the minimum value at
which the network attains the ability to retrieve a nominated pattern after eliminating
undesirable transient states, the good retrieval performance does not depend in an
essential way on a precise threshold adjustment. This may explain why biological
networks, in which there are no precise thresholds, can have a good performance
despite a fraction of missing synaptic connections. This is an activity dependent
property and it might help in the study of plasticity in neural networks. It may
also be useful for artificial neural networks.

It may be worthwhile to note the asymmetric dual role of the threshold dependence
discussed in this work. Whereas there is a continuous improvement in network
performance for low to moderate threshold, there is an abrupt end to the performance
for large threshold, as one would expect, since in the latter case mostly the inactive
states of the network become dominant. We also showed that there is an improvement
with finite dilution in both the size of the information content transmitted
by the network and in the continuous changes of the information with a varying
threshold. That is, as long as there is a convergence to stable stationary states, here
again the network performance seems to be less sensitive to threshold adjustment.

The network behavior discussed in this work is also expected to appear in a diluted
layered $Q = 4$ state network with low activity patterns, based on recent results on
a recurrent network which exhibits a continuous improvement in network performance
with varying low-to-moderate threshold for low activity patterns [3].